Speech segregation based on sound localization.
Authors
Abstract
At a cocktail party, one can selectively attend to a single voice and filter out all the other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel, supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial localization cues: interaural time differences (ITD) and interaural intensity differences (IID). Motivated by the auditory masking effect, the notion of an "ideal" time-frequency binary mask is suggested, which selects the target if it is stronger than the interference in a local time-frequency (T-F) unit. It is observed that within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes for estimated ITD and IID. For a given spatial configuration, this interaction produces characteristic clustering in the binaural feature space. Consequently, pattern classification is performed in order to estimate ideal binary masks. A systematic evaluation in terms of signal-to-noise ratio as well as automatic speech recognition performance shows that the resulting system produces masks very close to ideal binary ones. A quantitative comparison shows that the model yields significant improvement in performance over an existing approach. Furthermore, under certain conditions the model produces large speech intelligibility improvements with normal listeners.
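The "ideal" binary mask described above can be illustrated with a minimal sketch. It assumes the target and interference energies are available separately for every T-F unit, which holds only when the premixed signals are known; the system in the abstract instead estimates the mask from ITD and IID. The function name `ideal_binary_mask` and the toy energy values are hypothetical and are not taken from the paper.

```python
import numpy as np


def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 for each T-F unit where the target is strictly stronger
    than the interference, and 0 otherwise.

    Both inputs are arrays of shape (frequency_channels, time_frames).
    The name and signature are illustrative assumptions.
    """
    target_energy = np.asarray(target_energy, dtype=float)
    interference_energy = np.asarray(interference_energy, dtype=float)
    if target_energy.shape != interference_energy.shape:
        raise ValueError("energy arrays must have the same shape")
    return (target_energy > interference_energy).astype(int)


# Toy energies: 2 frequency channels x 3 time frames (made-up values).
target = np.array([[4.0, 1.0, 0.5],
                   [0.2, 3.0, 2.0]])
interference = np.array([[1.0, 2.0, 0.5],
                         [0.1, 5.0, 1.0]])

mask = ideal_binary_mask(target, interference)

# Applying the mask to the mixture energy keeps only the units
# in which the target dominates.
masked_mixture = mask * (target + interference)
```

In this sketch a tie between target and interference is assigned to the interference, following the "stronger than" wording of the abstract; a real system may use a different local threshold.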
Similar resources
Integrating Monaural and Binaural Cues for Sound Localization and Segregation in Reverberant Environments
The problem of segregating a sound source of interest from an acoustic background has been extensively studied due to applications in hearing prostheses, robust speech/speaker recognition and audio information retrieval. Computational auditory scene analysis (CASA) approaches the segregation problem by utilizing grouping cues involved in the perceptual organization of sound by human listeners. ...
An Active Machine Hearing System for Auditory Stream Segregation
This study describes a binaural machine hearing system that is capable of performing auditory stream segregation in scenarios where multiple sound sources are present. The process of stream segregation refers to the capability of human listeners to group acoustic signals into sets of distinct auditory streams, corresponding to individual sound sources. The proposed computational framework mimic...
Precedence based speech segregation in bilateral cochlear implant users.
The precedence effect (PE) enables the perceptual dominance by a source (lead) over an echo (lag) in reverberant environments. In addition to facilitating sound localization, the PE can play an important role in spatial unmasking of speech. Listeners attending to binaural vocoder simulations with identical channel center frequencies and phase demonstrated PE-based benefits in a closed-set speec...
Spatial Hearing Algorithms Based on Binaural Zero-Crossings: Sound Source Localization, Segregation, and Dereverberation
This thesis concerns a new zero-crossing-based binaural model for spatial hearing. Conventional binaural models compute cross-correlations of binaural signals to estimate the interaural time difference, which is a primary spatial cue. However, the cross-correlation-based binaural processing model requires high computational complexity and suffers from inaccuracies in localizing sound so...
Different Profiles of Verbal and Nonverbal Auditory Impairment in Cortical and Subcortical Lesions
Introduction: We investigated the differential role of cortical and subcortical regions in verbal and non-verbal sound processing in ten patients who were native speakers of Persian with unilateral cortical and/or unilateral and bilateral subcortical lesions and 40 normal speakers as control subjects. Methods: The verbal tasks included monosyllabic, disyllabic dichotic and diotic tas...
Reliability of Interaural Time Difference-Based Localization Training in Elderly Individuals with Speech-in-Noise Perception Disorder
Background: Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is, however, known about localization training vis-à-vis speech perception in noise based on interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program in speech-in-noise perception a...
Journal title:
- The Journal of the Acoustical Society of America
Volume 114, Issue 4 Pt 1
Pages: -
Publication date: 2003